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Abstract 

We present a method of constructing low-dimensional nonlinear models describing the main 
dynamical features of a discrete 2D cellular fault zone, with many degrees of freedom, em- 
bedded in a 3D elastic solid. A given fault system is characterized by a set of parameters 
that describe the dynamics, rheology, property disorder, and fault geometry. Depending on 
the location in the system parameter space we show that the coarse dynamics of the fault can 
be confined to an attractor whose dimension is significantly smaller than the space in which 
the dynamics takes place. Our strategy of system reduction is to search for a few coherent 
structures that dominate the dynamics and to capture the interaction between these coherent 
structures. The identification of the basic interacting structures is obtained by applying the 
Proper Orthogonal Decomposition (POD) to the surface deformations fields that accompany 
strike-slip faulting accumulated over equal time intervals. We use a feed-forward artificial 
neural network (ANN) architecture for the identification of the system dynamics projected 
onto the subspace (model space) spanned by the most energetic coherent structures. The 
ANN is trained using a standard back-propagation algorithm to predict (map) the values of 
the observed model state at a future time given the observed model state at the present time. 
This ANN provides an approximate, large scale, dynamical model for the fault. The map can 
be evaluated once to provide short term predictions or iterated to obtain prediction for the 
long term fault dynamics. 

1 Introduction 

A first principles approach to modeling and forecasting the dynamics of an earthquake fault is not feasible at 
present because the governing physical laws, geometric and structural fault properties, and controlling vari- 
ables (fault stresses and slips) are not fully available. A practical alternative is to build "phenomenological" 
models that attempt to estimate the overall character of the system's dynamics. These models quantify basic 



deterministic or stochastic relationships involving only a few irreducible degrees of freedom, which may be 
used for short-term prediction. We show that for earthquake forecasting this can be done using the spatio- 
temporal strain patterns embedded in the observable surface displacements. Such an approach is based on 
the observation that the large scale dynamics of the system often evolves on a manifold (or invariant measure 
for the case of strange attractors) with a dimension that is significantly smaller than that of the system's 
phase space. In such cases, a few macroscopic observables can approximate very well the present state of the 
system and predictive models based on the dynamics of a reduced number of macroscopic observables can 
then be constructed. 

In order to identify meaningful low-dimensional structures, we apply the Proper Orthogonal Decompo- 
sition (POD) - also known as the Principal Component Analysis (PCA) or Karhunen-Loeve expansion - to 
an ensemble of surface deformation data generated by the system's dynamics. The POD provides the most 
efficient way of capturing the dominant components of a dynamical process with only finitely many, and 
often surprisingly few, "modes" 17 . Using synthetic calculations for a strike-slip fault system, we show 
that a reduced number of deformation modes can explain on the average the large scale dynamics of elastic 
surface deformations. The dynamics of these modes live on a low-dimensional "reduced" attractor in the 
neighborhood of which the system spends most of its time. The state of the system in this reduced space - 
model state space - is represented by a set of modal coefficients that measures the projection of the ground 
surface deformation onto each dominant mode. Our goal is to extract the nonlinear modal dynamics in this 
model space by constructing a map of the observed model state at a future time given the observed model 
state at present time. We will present preliminary results in which an artificial neural network has been 
used with promising success to learn the dynamics of the reduced model from these modal time series. The 
reconstructed map has been iterated to obtain predictions for the long term model dynamics starting from 
its current model state. A rough test of the reliability of the model forecast to approximate the future of the 
fault system is also discussed. 

Our method may be compared to the linear pattern dynamics introduced by Rundle and his coworkers |25| . 
Their technique is based upon a Karhunen-Loeve expansion of the spatio-temporal scismicity data and is used 
to estimate a linear stochastic model for the evolution of a probability density function for seismic activity. 
In contrast to that method, which provides a local linear approximation in a probability space, we propose a 
global nonlinear approximation that describes the effective large-scale dynamics in a low-dimensional phase 
space. 



2 Description of the Earthquake System 

We distinguish between the assumed physical system and the model which is an approximate representation 
of the system. The assumed system is itself a simplified representation of fault systems in the earth, but will 
be considered a valid description of reality if its behavior is typical for the dynamics of an isolated fault. Our 
goal is to build data driven models, with useful predictive skills, which approximate the large scale dynamics 
of the system. If this is possible, we can conclude that the methods we propose can then be extrapolated to 
real isolated faults and, perhaps, even to complex fault systems. 

Our assumed system corresponds to a discrete strike-slip fault of length L = 70 km and width W = 17.5 
km embedded in a 3D elastic continuum [5] . The fault consists of a uniform grid of dynamical cells where 
slip is governed by static/kinetic friction processes, surrounded by regions with imposed constant slip rate 
of V p i = 35 mm/year, representing the tectonic loading. We divide the 70 km x 17.5 km computational 
grid into 128 x 32 square cells, with length Aa; and depth Az having equal dimensions of approximately 
550 m, that corresponds to the dimension of small, effectively disconnected, slip patches in mature fault 
zones [5]- The brittle deformation at any fault position and time is governed by quasi-static, 3D elastic 
dislocation theory and spatially varying "macroscopic" constitutive parameters that describe the static and 
kinetic friction processes. The stress at any fault position (cell) increases with time, t (measured in years), 
due to the gradual tectonic loading and the time-dependent brittle deformation at other fault locations. It 
can be written as a boundary integral over the slip deficit <pki = V p it — um, 



where is the elastic stress transfer function based on the solution of Chinnery |10j . Uki is the right 

lateral slip at fault cell (kl) ( measured in mm), and the summation is over the brittle area, B =70 km x 17.5 
km, of the fault plane. The cell indexes i, k and j, I along the horizontal and vertical directions, respectively, 
define the spatial cell location. We assume that the static brittle strength is uniform along the fault and is 
set to a value of t s = 100 bars. If the stress Ty reaches the static strength t s , a brittle failure occurs at this 
location and the cell slips an amount Aity, till the local stress drops to a prescribed arrest stress level, r 0) »j. 
The stress transferred from the failed cell can lead to subsequent brittle failures (i.e., rupture propagation) 
if the stress anywhere increased to the failure threshold. These failures may, in turn, induce or reinduce 
more brittle slip events. After an initial slip event the strength drops to a dynamic level, r^y < T s ,ij, and 
reinitiation of brittle slip on an already failed cell occurs when Ty > r^y there. The brittle slip associated 
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with a stress drop can be described by 

Auij = lLi^ii , (2) 

where -Fij.ij is the self stiffness of a cell and the failure stress r/^j equals the static failure stress, r s , for a cell 
which has not slipped or is adjusted to its dynamic value, T^ij, every time a cell has failed. Figure ^ shows a 
schematic stress-time and stress-slip history for an individual cell. During an iteration loop, all failing cells 
are identified and their slips updated according to Eq. At the end of each iteration the stress field is 

updated everywhere according to Eq. The iterations end when there are no more brittle instabilities. 

This marks the end of an earthquake event whose strength is measured by its potency P defined ,7] as the 
integral of slip over the rupture area, 

P = J2 An io AA . (3) 

ij 

where AA — Ax x Az = 0.29 Km 2 and Au^ is the total slip at cell (ij). During a seismic event the tectonic 
slip, Vpit, remains constant. At the end of each event the failure strength everywhere on the fault recovers 
back to t s . In order to initiate the next seismic event we advance in Eq. Q the tectonic slip (or equivalently, 
advance the time t) till the fault cell closest to failure reaches its static failure strength. This triggers a new 
seismic event and starts again the failure iteration loops. 

An important component of the fault rheology is the distribution of arrest and dynamic stresses along 
the fault. In the simulations used in this paper we chose random stresses, 

T a ,ij =< r a > +A£ , (4) 

where < r a > is the fault-averaged arrest stress, £ is a random number uniformly distributed in the range 
[—0.5,0.5], and A is the noise amplitude. The static strength, dynamic strength, and arrest stress are related 
to each other, at each point on the fault, as 

D= r s -r a , ij ^ (5) 

Ts Td,ij 

where D is a dynamic overshoot coefficient. Its inverse, 1/D, proportional to r s — T^-y, is a measure of 
the dynamic weakening that characterizes static/kinetic friction models. The arrest and dynamic stress 
distributions do not evolve in time (quenched heterogeneities) and the fault dynamics is deterministic. We 
also assume no observational errors. We record the fault evolution after all transients have died out and the 
dynamics have reached a statistical steady-state (all moments of the stress and displacement fields become 
time independent). A thorough motivation of the fault system, based on a wealth of observations and 
numerical results, is provided in a series of papers by Ben-Zion and Rice 00 El) while a detailed description 
of the dynamics can be found in [21 El El • 



3 Qualitative dynamics of slips, stresses, and seismicity 

We start by analyzing the evolution of seismicity and the dynamics of the first and second order moments of 
the stress and slip fields. Here we only investigate the dynamics of two fault realizations that differ largely in 
the value of the dynamic overshoot coefficient. System A with D = 1.5 has a large dynamic weakening, while 
system B with D = oo has no dynamic weakening. Both systems have random arrest stress distributions, as 
in with < r a >= 80 bars and A = 20 bars amplitude. A detailed investigation of the dynamics, covering 
the system space between these two limits, will be presented elsewhere In general, each system observable 
contains a different amount of information for understanding the underlying dynamics. As discussed below, 
for prediction purposes we are only interested in the dynamics at the large length and time scales. As the 
small scale dynamics overwhelmingly dominates the statistics, it is not obvious a priori which observable 
provides information about the large scale dynamics. In Fig. [2] we present the time evolution of the potency 
released over a 30 years time interval. The fault motion in system B (D = oo) seems to be dominated 
by erratic, random appearing evolution. There are no detectable time patterns. When significant dynamic 
weakening is present, as in system A (D — 1.5), a clear signature of a quasi-regular evolution emerges. This 
can be identified in the presence of very large events (these are large scale events in which a large fraction of 
the fault slips), followed by intervals of seismic quiescence. There is a quasi- regular build up of correlations |S] 
which ultimately generates very large events, much larger in size than in system B. Although the evolution 
of seismicity is dominated by small events, there are significant evolution patterns reflecting the dynamics 
of the large scale events. These can be better seen in Fig. [21 which shows the evolution of the average slip 
deficit, 

0(*) = jjjfX)(M*)-<^i>) , (6) 

and slip deficit variance, 

<K*) = ^E(<M*)-<K*)) 2 • (?) 

Due to the inhomogeneity of the slip deficit, the slip fluctuations are measured from a local time average, 
< 4>ij >, defined as 

I T 

t=i 

where the length of the averaging time used is T = 100 years. For the D = 1.5 system we observe large 
seismic cycles and smooth long term behavior between two consecutive large events. At the scale of the plot 
in Fig. 13 the erratic and unpredictable small scale dynamics is unobservable. Magnifying the plot will reveal 
small scale jitters superposed over the smooth large scale motion. This suggests a clear separation of time 
scales between the slow, large scale motions and the fast, small scale motions. In contrast, the D — oo system 



has small oscillations and noisy behavior; note the large difference in the scale of the variance evolution for 
these two systems. For the D — oo system there are no large scale features and the dominant signature of 
small scale events renders the system higher dimensional. Additional illustrations of the dramatic change in 
the dynamics for cases with and without dynamic weakening are given in [Sj. 

The emergence of a clear separation of length scales is also revealed in the frequency-size statistics shown 
in Fig|l| The D = oo system has a power-law distribution with an exponential cutoff, characteristic of 
systems close to a critical point. This confirms the analysis of Fisher et al. who showed that the 

dynamic weakening r s — Td,ij is a tuning parameter and found that the system has an underlying critical 
point of second order phase transition at zero dynamic weakening (D = oo) and full conservation of stress 
transfer during failure events. The frequency-size statistics for the D = 1.5 system shows the existence of 
two separate earthquake populations. At small scales we observe a power-law distribution of event sizes. The 
other events cover a broad range of scales and their frequency is significantly higher than an extrapolation 
based on the power-law statistics of the small events. This behavior is reminiscent of the characteristic 
earthquake model [HDl Ej , which has a distinct probability density peak at the large size end of the frequency- 
size distribution. Moreover, a careful analysis [2] has shown that most stress dissipation is produced at the 
large size end of the potency spectrum and is due to the large scale motions. Averaged over small time 
intervals, the small events that dominate the frequency-size histogram do not produce any stress dissipation 
on average and only transfer stresses internally. This suggests a possible description of the dynamics in terms 
of large scale dissipative motions, that balance the stress increment due to an effective loading rate, which 
absorbs most of the small scale events into a renormalized plate velocity. This coarse-grained dynamics is 
inherently of lower dimension than the original dynamics. The question is whether the dimension collapse is 
significant to render data driven model reconstruction algorithms practical. 

4 Effective dimension of the large scale motions 

For the class of systems discussed in this paper the knowledge of the slip deficit, (f>ij, where i = 1, . . . , N x 
and j = 1,...,N Z , fully specifies the state of the fault; the number of microscopic degrees of freedom is 
M = N x x N z and the state of the system is completely represented by a vector in a finite, but high- 
dimensional vector space R , where M — 4096. We conjecture that a finite dynamic weakening favors the 
creation of spatial and temporal correlations that collapse the asymptotic, long time dynamics of the fault 
onto an attractor of a smaller effective dimension to; m <C M. This is compatible with the recent analysis of 
Ben-Zion et al. 0. The physics of the dimensional collapse consists in the clear separation of time and length 



scales. In such cases, we can in principle isolate the large scale dynamics which, evolving slowly, will drive 
the rapidly relaxing small scale dynamics. But, are real faults characterized by a large dynamic weakening? 
Full elastodynamic simulations of rupture propagation in a 3D elastic solid [20] indicate that D ~ 1.25, 
corresponding to large dynamic weakening. Moreover, we have shown elsewhere |2] that this separation of 
scales holds for a broad range of D values and is, therefore, generic. Consequently, restricting our study 
to system A, we shall address for the rest of this paper the following issues (1) Whether we can build a 
predictive model for the large scale motions and (2) What effect the neglected small scale motions have on 
the predictability of the larger scales. 

In order to model the large scale motions we have to find first if their dynamics is deterministic in character. 
In principle, the coarse grained dynamics of any deterministic system is stochastic. When a separation of 
scales exists, we can assume that only the statistical properties of the small scale motions influence the larger 
scales, through coefficients of turbulent viscosity, and that at any moment these statistical properties are 
determined by the larger scale motions |19j . A model describing only the large scale dynamics is assumed 
to be effectively deterministic. However, we have to be careful, because, as pointed out by Lorenz |19| . 
there remains uncertainties in these statistics, and their progression to the very large scales will ultimately 
produce large scale errors and limit the forecasting skill of our model. We will therefore assume, as a working 
hypothesis, that a model describing only the larger scales motions is effectively deterministic. In order to 
test our assumption and find an upper bound on the dimension of the model, we start with a phase space 
analysis of the large scale motions as is reflected in the scalar dynamics of the slip deficit variance. 

The key to unraveling the dynamics from a scalar time series is to reconstruct a vector space, formally 
equivalent to the original attractor of the dynamics. For a deterministic system, Takens |18| embedding 
theorem guarantees that such a reconstruction is possible from a time-delay reconstruction of only one scalar 
observable. In our case, the embedding is performed using Cartesian coordinates made out of the observed 
slip deficit variance and its time delayed copies, 

X„ = [a(n),cr(n — d), . . . ,a(n— (m - l)d)] , (9) 

where <j{n) = a(nAt) is the slip deficit variance measured at equal sampling times At = 0.1 years, d is 
a time lag (an integer multiple of the common lag At), and m is an embedding dimension. The optimal 
time lag, d = 3, has been chosen at the first minimum of the time-delayed mutual information as suggested 
in |12j and motivated in detail in |18j . The problem is to determine the optimal embedding dimension m, 
for which the reconstruction by delay vectors provides an acceptable unfolding of the attractor so that the 
orbits composing the attractor are no longer crossing each other in the reconstructed phase space pp. The 



practical solution is to look for false neighbors in the embedding phase space at a given value of m |18| . To 
understand this concept, consider the situation that an m dimensional delay reconstruction is an embedding, 
but an (m — 1) dimensional delay reconstruction is not. If the embedding dimension is too small to unfold 
the attractor, a small i? m_1 neighborhood will contain points that belong to different parts of the original 
attractor. Therefore, at a later time, the images of these points under the system's dynamics will split onto 
different groups, depending on which part of the attractor the points are originally coming from. This lack 
of a unique location of all the images in (m — 1) dimensions is reflected in finding false neighbors, meaning 
that determinism is violated. When increasing to, starting with small values, one can detect the minimal 
embedding dimension, m e , by finding no more false neighbors. Then, we can invoke the result of Sauer et 
al. |27| who showed that the attractor formed by X„ in the embedding space is equivalent to the attractor 
in the unknown space in which the system is living if m is larger than twice the box counting dimension, do, 
of the attractor. Often, an m = d a embedding dimension is enough for unfolding the attractor. Therefore, 
the minimal embedding dimension sets the following bounds, (m e — l)/2 < do < m e , for the dimension of 
the attractor in the true phase space. Figure shows that the percentage of false neighbors for the D = 1.5 
system falls under 1% when the phase space embedding of the slip variance data reaches dimension seven. 
This behavior suggests a topological dimension in the true phase space between three and six. We note that 
the percentage of false neighbors is not completely reduced to zero. The explanation for this behavior is 
discussed in Section 6. To appreciate this result, we show for comparison in Fig. [SJthe behavior of the system 
in the limit of no dynamic weakening. Even for an embedding dimension as high as ten, the percentage of 
false neighbors does not drop below 4%, and is an order of magnitude larger than the percentage for the 
D = 1.5 system at the same embedding dimension. This confirms our previous analysis and proves that in 
the limit of low dynamic weakening the system has high dimensional dynamics. 

The foregoing analysis shows convincingly the deterministic structure of the large scale motions: for 
stochastic or very high dimensional dynamics the number of false neighbors will never effectively drop to 
zero. The low-dimension result is consistent with the embedding theorem |27| which asserts that it is not the 
dimension of the underlying true phase space that is important for the minimal dimension of the embedding 
space, but only the fractal dimension of the support of the invariant measure generated by the dynamics in 
the true phase space |18j . These results justify our earlier remarks that due to some collective behavior only 
a few dominant degrees of freedom remain and our conjecture that their dynamics is effectively deterministic. 

Before addressing the problem of identifying these collective degrees of freedom, we would like to know 
if their dynamics is chaotic and, if so, what are the implications for their predictability. To answer this 
question we compute the maximal Lyapunov exponent. This exponent measures the exponential divergences 



of nearby trajectories and is an average of these local divergences over the whole data. A positive maximal 
Lyapunov exponent is a signature of chaos. Its computation is based on the algorithm described by Kantz 
and Schreiber JS] and uses the implementation provided in the publicly available software package TISEAN 
16 (all nonlinear time series algorithms described in this paper use the same excellent implementation). For 
a point X n of the time series in the embedding space, the algorithm determines all neighbors X„< within a 
neighborhood U n of radius e. Then, for each neighbor, we consider the distance 6(0) — |X n — X„<| and we 
read its evolution 6 (An) = |X„ + An ~ X„/ + An| from the time series. If for all neighbors 6 (An) ~ 6(0)e XAn , 
then A is the local divergence rate. The local divergence rate varies along the attractor and the true maximal 
Lyapunov exponent is an average over many reference points n. Therefore, in practice we compute the 
average over the distances of all neighbors to the reference part of the trajectory as a function of the relative 
time An. The logarithm of the average distance at time An measures the effective expansion rate over the 
time span An: 



where the outer bracket denotes the averaging over the reference point X„, |Z7 n | is the number of neighbors 
in U n , and the argument of the logarithm is the average local expansion rate. If S(An, e, to) vs An exhibits 
a robust linear increase with identical slope for a range of e values and for all to larger than some minimum 
embedding value, then its slope is an estimate of the maximal Lyapunov exponent X max per time step |18|. 
Figure El shows three bundles of curves corresponding to three neighborhood sizes, e = 0.377, 0.671, and 
1.19, as can be seen for An = 0. Each bundle shows the behavior of the expansion rate for to = 6,7,8,9 and 
10 and proves that the result is robust to changes in e and does not depend on the embedding dimension 
when to is large enough. Our estimate for the maximal Lyapunov exponent is X max = 0.8 years -1 . We 
can therefore conclude that the large scale motions have a positive maximal Lyapunov exponent and exhibit 
sensitive dependence on the initial conditions. Does this dependence sets a fundamental limitation to long- 
term forecasting? Two initially close trajectories will diverge exponentially in the phase space with a rate 
given by the largest Lyapunov exponent X max . Therefore, for any finite uncertainty 6 in the initial conditions, 
we can forecast the future state of the system only up to a maximum time, 



where A is the accepted tolerance level. For the earthquake system, an approximate estimate for the pre- 
dictability time is T p — 1/X m ax — 1-25 years. This result is disappointing and, due to expected model 
reconstruction errors, which might overshadow the uncertainty in the initial conditions, raises questions on 
the value of any forecasting endeavor. But this estimate is naive and holds only for infinitesimal perturbations 




(10) 



and in nonintcrmittent systems. Generally, the predictability time is scale dependent jS] and can be much 
longer than the rough estimation T p ~ 1/X max - As we will shortly see, at the large length scale of interest 
for forecasting, a better estimation of the predictability time is T p ~ 10 years. 

5 Proper Orthogonal Decomposition 

The dynamics of the first and second order moments of the slip and stress fault fields provides an adequate 
description of the large scale motions. Unfortunately, their measurement poses very difficult problems: the 
stress field is not directly observable while the slip field requires the solution of an ill-posed inverse problem. 
Therefore, we propose an alternative analysis of the large scale motions based on the spatio-temporal strain 
patterns embedded in the surface deformation fields that can be accurately measured by InSAR and GPS 
observations |2|. Assuming uniform slip discontinuities over each rectangular cell, the components of the 
surface displacement vector U a are found to be |22j . 



where r a (x; x^ ) is the surface displacement in the direction a = x, y, z at the observation point x = (x, y, 0) 
due to uniform unit slip on the fault cell (ij) located at Xy = (i, 3 -, 0, Zij). The space-time signal is obtained 
by simultaneous measurements of the surface displacements on a 64 x 32 uniform rectangular grid which 
covers a 100 km x 50 km surface area centered around the fault. For each surface deformation we compute an 
ensemble of N snapshots, U a (x, n) — U a (x, nAt), n = 1, . . . , N, and a = x,y, z , every At = 0.1 years. The 
analysis of each deformation direction proceeds identically, therefore we will henceforth drop the deformation 
index a. Since we are only interested in decomposing the dynamics of fluctuations, it is convenient to separate 
the flow U(x, n) into a time-independent mean and a fluctuating part, i.e. 



where < u(x, n) >= and < • > indicates the ensemble average. 

The identification of the active degrees of freedom in the surface deformation fields uses the Principal 
Orthogonal Decomposition (POD) [Tf\. The POD seeks an optimal representation (in the least square sense) 
for the members of the ensemble {it(x, n)}^ =1 . It searches for generalized directions <fr(x) in the configuration 
phase space, such that most of the ensemble fluctuations will be directed along </>(x). Mathematically, this is 
achieved by maximizing the average projection of the {u(x,n)} ensemble onto </)(x), i.e. 




(12) 



U(x,n) =< U(x) > +u(x,n) 



(13) 



max < (u, (h) 2 > 

||0|| = 1 



(14) 



where (•, •) is the L 2 inner-product in the configuration space and the L 2 norm constraint, \\<j)\\ — 1, is required 
for the maximum to be defined. The solution to this variational problem is given by the eigenfunctions {</>i}i f 
of the following integral equation J7| , 

if(x,y)^(y)dy = Ai0i(x) , Ai > A 2 > . . . > A M > , (15) 

whose kernel K(y:,y) is the two point correlation matrix of the ensemble, K(x, y) =< u(x)u(y) >. There 
are at most M eigenfunctions corresponding to the total number of degrees of freedom. The eigenvalues Ai 
measure the mean square fluctuations of the ensemble in the directions defined by their corresponding eigen- 
functions: Ai =< (u,4>i) 2 >. We can also think of Ai as the average "energy" of the ensemble fluctuations 
projected onto the <pi axis. Therefore, ranked in decreasing order of their eigenvalues, the eigenfunctions 
(sometimes called empirical eigenfunctions, coherent structures, or dominant modes) will identify the domi- 
nant directions in configuration space along which most of the fluctuations take place. The first four modes 
describing the predominant motions of the u x deformation field for system A with D = 1.5 are shown in Fig. 
The coherent, collective nature of the fluctuations represented by these modes is eloquently expressed by 
their spatial structure. The spatial coherence sharply decreases for the higher order modes (not shown). The 
modes probe the system at different scales, from the largest scales, captured by the most dominant modes, 
to the smallest scales, described by the higher order modes. This observation is hardly surprising, since for 
a translationally invariant system the POD modes are simply the Fourier modes |17) . 
The basis {4>i}f=i is optimal in the sense that any reduced representation of the form 

m 

m(x, n) ~ y^Aj(n.)(fo(x), m<M , (16) 

i=l 

describes typical members of the ensemble better than any linear representation of the same dimension m in 
any other basis: the leading m POD modes contain the greatest possible "energy" on average ^7j. Therefore, 
due to the coherence and optimality of the POD basis, Eq. I|16(l provides the most efficient Euclidean (linear) 
embedding for the large scale motions of the system. Moreover, the modal coefficients are uncorrelated on 
average, i.e. 

< AiAj >= XiSij , (17) 

which reflects the fact that orthogonality in the embedding space, (<f)i,(j)j) = 5ij, is related to the statistical 
properties of the time series. 

The choice of the embedding dimension m is based on the computation of the cumulative normalized 
eigenvalue spectrum, A m , defined as: 

A» = kHr ■ (18) 



The cumulative spectrum can help us define an effective POD embedding dimension by finding the minimum 
number of modes needed to capture some specific fraction / < 1 of the total variance of the data: 

dpoD = argmin{A m : A m > /} . (19) 

m 

Figure [S] shows the cumulative normalized spectrum for each of the surface deformation modes. For system 
A (D = 1.5), in order to explain / = 95% of the variance in the u x , u y , u z data sets we need only 2, 3, and 
5 modes respectively. This is consistent with our earlier estimates of the embedding dimension. In contrast, 
for system B (D = oo), we need 9, 27, and 42 modes respectively. 

It is important to realize that the POD modes carry no dynamic information. Reshuffling the ensemble 
of configurations - and, therefore, destroying the dynamic content hidden in their time ordering - produces 
the same POD modes. Therefore, in order to identify the dynamics of each POD mode we calculate the 
projections of the surface deformation fields onto the dominant deformation modes, i.e., 

A-(n) = («(x J n),0 i (x)) . (20) 

This way, we generate a small number of modal time series Ai(n),i — 1, . . . ,m, that encode the evolution, 
interaction and dynamics of the spatial modes |17j . They encapsulate the projection of the system's dynamics 
onto the m-dimensional model space defined by Eq. Ijltjfl ■ 

In Fig. [5]we represent the modal time series (red lines) corresponding to the first four u x spatial modes for 
fault system A. The blue lines describe the time evolution of the binned potency released (the total potency 
released over the time interval At). For the first time series, Ai(n), the location in time of the amplitude 
jumps coincides with the time of large events (within the temporal resolution defined by At). Moreover, we 
can show that the size of the amplitude jumps is proportional with the size of the event, i.e. AAi(n) = 
Ai(n + 1) — A±(n) ~ P m ax{n), where P m ax(n) is the largest event in the n'th time interval. Therefore, if we 
can model the evolution of this mode, the time and size of the amplitude jumps will give us useful information 
about the time and size of the large earthquake events. Moreover, we notice that the higher order modes 
evolve on faster time scales, as is already noticeable from the faster decay of the temporal autocorrelations 
shown in Fig. 1771 

The picture that emerges from this decomposition evolves gradually from the slow, coherent large scale 
motions, whose dynamics is captured by the most dominant POD modes, to the fast, incoherent small scale 
motions described by the higher order modes. In Section 7 B we describe how a neural net can be used to 
process these time series in order to extract a low-dimensional nonlinear dynamic model with short-term 
predictive capabilities. 



6 Geometry of the attractor in the POD basis 

The POD modes provide an optimal basis for the phase space embedding expressed in Eq. I|16(l . The phase 
space vectors in this reconstruction, X„ = (Ai(n), . . . , A m (n)), approximate the location of the system on 
the attractor at discrete time moments. Compared to the time-delay embedding, the POD basis unfolds the 
attractor with increased resolution as the embedding dimension increases. This property allows us to probe 
the structure of the attractor at different length scales by computing its correlation dimension. This measure 
was introduced by Grassberger and Procacia |14| to quantify the self-similarity of geometrical objects. It is 
based on the definition of the correlation sum for a collection of points X„ in some vector space R m . This 
is the fraction of all possible pairs of points which are closer than a given e in a particular norm. The basic 
formula is 

2 R n 

° M = R(N-2W-1) ^ £ e(e-||X i -X i || 00 ) , (21) 

i=l j=l,\i-j\>W 

where we use the maximum norm, || X ||= max \xi\, O is the Heaviside step function, TV is the total number of 
data points, and R is the total number of reference points. The sum counts the pairs (Xi, Xj) whose distance 
is smaller than e. Note that data points within a time window 2W of any reference point j are not included 
in the correlation sum. An appropriate choice of W — 100 reduces the spoiling effect of autocorrelations 
from the correlation sums as suggested by Theiler |29) . In the limit of an infinite amount of data and small 
e, the attractors of deterministic systems show power law scaling, i. e., C(e) ~ e d2 , and we can define the 
correlation dimension di by 

d,2 = lim lim d,2(m,e) . (22) 

For each embedding dimension m, d,2(m, e) measures the local slopes of the correlation sum at different length 
scales e, 

, , . <91nC(m, e) ,„ . 

d 2 (m,e)= d lne ■ (23) 

In Fig. rTO| for system A on the left and B on the right, we show the e dependence of d2(m, e) for embedding 
dimensions m = 1,...,20. This plot allows the identification of a scaling range and estimation of the 
correlation dimension if such range occurs. This is reflected in the presence of a plateau in the e dependence 
of d2{m,e) which does not change much with embedding dimension m when m > do: the correlation sum 
probes the attractor at different length scale and tests for scale-invariance. 

From the point of view of the physics involved, we distinguish four different types of behavior for ^(m, e) 
at different regions of length scale |T^]. For small e and large m (region I in Fig. I10|) the lack of data points 
is the dominant feature and the values of d2(m,e) are subject to large statistical fluctuations. If e is of the 
order of the size of the entire attractor (region IV), no scale invariance can be expected. In between, we can 



distinguish two regions. In region II, at small length scales but good statistics due to the small embedding 
dimension, the reconstructed points reflect the low amplitude and high frequency dynamics generated by 
the small events. At these length scales the dynamics is high dimensional and the reconstructed points fill 
the entire phase space available, therefore we expect d^im, e) ~ to. Up to the embedding dimension m = 4 
this estimate is recovered in the limit e — > 0. For increased embedding dimension there are large statistical 
fluctuations due to the lack of neighbors. 

For system A, in region III located at larger length scales between regions II and IV, the dynamics evolves 
on a low-dimensional self-similar attractor whose presence is detected by the plateau in the correlation 
dimension at d%{m, e) ~ 1.3. Note that for system A the scaling regime, e — > 0, is not reachable. The 
breakdown of scaling at the smaller length scales is dynamic in nature and is not due to the lack of good 
statistics. The closer we look at the system, the more degrees of freedom become visible and the dimensionality 
of the dynamics is higher than our largest embedding dimension. This also explains why the percentage of false 
nearest neighbors never drops completely to zero with increased embedding dimension. On the same figure, 
within the range of embedding dimensions that were numerically accessible, system B shows no signature of 
a large scale dimension collapse, in dramatic contrast with the behavior of system A. 

The length dependent dimensions lead us to conjecture that we observe different subsystems on different 
length scales. From the self-similar and low-dimensional structure at the large length scales the structure of 
system A attractor crosses-over to a high dimensional, stochastic like structure when probed at very small 
length scales. Our strategy for building a model with good forecasting skills should be obvious now. The 
underlying idea is to approximate the large scale motions of system A through a low-dimensional embedding 
into the subspace spanned by the most dominant POD modes and to extract a finite-dimensional model in 
the form of a set of ODEs of comparable dimension. Does a good nonlinear model for the large scale motions 
exists? The answer depends on how rapidly the small scale motions of the true system progress to reach the 
larger length scales. If this time is short, an effective model for the large scale motions might not exist. If the 
growth time is sufficiently long, the existence question is well posed and a model for the large scale motions 
might exist. To address this question we now proceed to the model reconstruction task. 

7 Model Reconstruction and Short-Term Earthquake Forecasting 

Using the POD decomposition we have identified a low-dimensional linear space in which system A evolves 
most of the time and we have reduced its dynamics to a small set of time series, Ai(n), i = 1, ...,m, 
describing the evolution of the system in this reduced linear space. We now face the problem of determining 



the underlying dynamical process from the information available in these time series. They are assumed 
to be governed by a nonlinear set of ODEs and our modeling approach relies on the ability to identify an 
approximate m dimensional model, 

Ai(n+1) = F i [A 1 (n),A 2 (n),...,A m (n)] i = l,...,m , (24) 

that describes an explicit Euler approximation to the evolution and interaction of the spatial modes. To 
identify this nonlinear mapping we employ an artificial neural network (ANN) . Generally, the neural network 
approach is used as a "black-box" tool in order to develop a dynamic model based only on observations of the 
system's input-output behavior 23,24. In the learning process the network adjusts its internal parameters to 
minimize the squared error between the network output and the desired outputs. A typical learning method 
is the error back-propagation algorithm which is a first order gradient descent method 15 . 

All reconstruction results described here refer to the dynamics of the u x surface deformation modes 
for model A with D = 1.5. We found it difficult to identify a simple model having the structure defined 
by Eq. I|24|) . In order to improve the model forecasting skill it is useful to enlarge the structure of the 
model to include information about the past history of the modes. While the best model structure is still a 
subject of investigation, an analysis of the time patterns present in the modal evolution provides a partial 
understanding of this result. As Fig. [5] shows, one essential feature of the modal dynamics is the presence 
of two time scales: within each earthquake cycle there are intervals of slow and fast motions with detectable 
quasi-regular behavior. This is reflected in the two-point autocorrelation functions (Fig. Illfl defined as 

Cu{T) = <Mn) >2 ' (25) 

where the < • > denotes the time average. The resulting plots exhibit oscillatory behavior with a slow 
amplitude decay over a longer time scale indicating that the system has two correlation time scales. We have 
observed that providing the ANN with information about these long-term correlations of the modal dynamics 
produces nonlinear models with increased forecasting skill. To include this information, we have modified 
the structure of the model in Eq. I|24|) and replaced the modal coefficients At with time-delayed vectors Xj , 

Xi = {A i {n),A i {n-l),...,A i {n-d),A i {n-2d),...,A i {n-{K-l)d)) , (26) 

" v v ' 

short-time memory long-time memory 

where d is the delay and (K — 1) is the number of the time-delay intervals. Due to the presence in the modal 
dynamics of two different time scales, we have included both short, Ai(n), Ai(n — 1), . . . ,Ai(n — d), as well 
as long time, Ai(n — 2d), . . . , Ai(n — (K — l)d), memory information in the time-delayed input vectors. The 
dimension of each time-delayed vector is K + (d — 1). The ANN output then provides a prediction of the 



mode amplitude Ai at time (n + 1), 



A i (n + l) = F i [X 1 (n),X 2 (n),..., 



X m (n)] i = l,...,m 



(27) 



based on input information describing the past mode histories. 

The ANN is trained using a standard back-propagation algorithm [T^. When training succeeds, the 
ANN provides an approximate dynamical model (map) for the large scale motions of the fault. The map 
can be evaluated once to provide short term predictions or iterated to predict the long term fault dynamics. 
Besides the parameters describing the structure of the ANN (nonlinear transfer functions, number of layers 
and neurons in each layer), the model structure itself has many parameters (m, d, K) that can be adjusted in 
order to improve the model performance. Our goal is to find a good model describing the evolution of the first 
POD mode, whose dynamic discontinuities trace accurately the time and size of the large events. Information 
about the higher order modes and their past histories is only necessary to uniquely determine the state and 
model trajectory in its phase space. One of the best models found so far describes the evolution of the first 
two u x surface deformation modes: m — 2. For each spatial mode, a time-delayed vector with parameters 
K = 6 and d = 6 was used. Because the dimensionality of the input space, m(K + (d — 1)) = 22, is higher 
than our estimated embedding dimension, we have decided to perform a POD analysis of the ensemble of 
input vectors. This time, the POD decomposition performs the analysis of the dominant temporal patterns 
that are created by the modal dynamics. Similar to the spatial decomposition, we first compute the two 
point correlation matrix of the ensemble, 



where i, j = 1, . . . , m denote the input modes, r, ,s = 1, . . . , K + id — 1) are the components of the time- 
delayed mode vectors, and < • > is the average over the ensemble of input vectors. Due to the statistical 
independence of the modal coefficients, Eq. (|17|l . the correlation matrix is to a good approximation block 
diagonal, i.e. Ki r j s ~ for i ^ j (any modal cross-correlations are due to finite size effects). The eigenvalues 
and normalized eigenvectors of the correlation matrix, 



(28) 





(29) 



will now provide an optimal representation of the dynamics of the spatial modes: 



m(K+(d-l)) 





(30) 



where 



m 



K+(d-l) 




(31) 



We found that 99% of the variance of the input ensemble (to = 2, K = 6 and d = 6 model) can be represented 
by the first six temporal modes \I/ fe ,fc = l,...,6. We have therefore chosen to approximate the input vectors 
presented to the ANN by their six dimensional POD projection B(n) = (B\(n), . . . ,Be(n)) onto the space 
spanned by the first six temporal eigenvectors. Denoting by Px^b this projection operator, the structure of 
the model has now become: 

A i (n + l) = F i [Px^ B [X 1 (n),X 2 (n)]] = F i [S 1 (n),...,S 6 (n)] i = 1,2 . (32) 

The best forecasting performance was obtained for an ANN with two hidden layers of 10 neurons each. The 
input to the network is the 6-dimensional projection B(n) and its 2-dimensional output, (A\(n+1), A 2 (n+l)), 
is the model forecast for the first two mode amplitudes at the next time step. Because the large events have 
long recurrence times, the sequential selection of the training set will be dominated by input-output pairs 
that only express the quasi-linear behavior between two consecutive large events: the ANN will learn a linear 
model and will fail to forecast the intrinsically nonlinear large earthquake events. Therefore, we have designed 
the training set to include enough information describing the dynamics around the scarce large events. The 
modal time series were first rescaled to evolve in the interval [0, 1]. We then classified as large events all events 
in which the first mode has a jump larger than 0.05 in the rescaled units. Next, we chose a time window 
W = Kd, and checked for the presence of a large event located at the center of this window, as it slides along 
the modal time series. If we found one, we called this a nonlinear window, and all the input-output pairs 
describing the evolution of the system inside this time window were included in the training set. For each 
nonlinear window, we have included training pairs from a linear window (a segment of the time series that 
did not have a large event). With this training set, the ANN adequately learned to model the linear as well 
as the nonlinear features of the dynamics. 

Due to non-convexity of the error surface in the network parameter space, we train many ANNs starting 
from different initializations of the neuron weights. To avoid ovcrfitting, the training process stops when the 
error on the validation set starts to increase - the validation sample is a subset of the training set which is 
not actually used in training. At the end of the training cycle each ANN provides a model for the system 
dynamics. To find the "best" model, we test the forecasting skill of each model realization using time series 
segments not included in the training or validation set. Starting from an initial configuration describing for 
each input mode the current amplitude and its past K + (d — 1) values, we iterate the ANN forward in time 
for a number of F steps. At each time step the output of the network was used to update and reconstruct the 
ANN input for the next time step. Assuming perfect initial conditions this procedure provides a trajectory 
whose forecasting accuracy is controlled only by the imperfections of our model. However, the reconstructed 



model is always an imperfect one and perhaps a perfect model describing the large scale motions does not 
even exists. Moreover, the current state of the system in the model space is always obscured by observational 
uncertainty. To make things worse, our knowledge of the spatial modes themselves is limited by the finite 
size of the ensemble of surface deformations. For example, the accuracy of the two-point correlations cannot 
exceed l/v~N, where N is the size of the ensemble: when we have only observations of finite duration, the 
"true" modes, describing the large scale motions, will never be exactly known. How can we then evaluate the 
model forecasting skill? A detailed answer is beyond the scope of this paper and our goal here is only to show 
that despite all these difficulties, and many others not mentioned here, our imperfect models can provide 
stable and robust forecasts. Our remarks here follow Smith j2H], to which we also refer for a clear discussion 
of uncertainty in initial conditions, model errors, ensemble verification, and predictability in general. 

For nonlinear systems, uncertainty in the initial conditions severely limits the utility of single deterministic 
forecasts. Internal consistency requires that all nonlinear forecast should be ensemble forecasts [2H1- In this 
approach to forecasting, a collection of initial conditions, each consistent with the observational uncertainty, 
are integrated forward in time. When the system evolves on an attractor, the selection of the initial conditions 
is far from trivial. For a perfect model ( a model that has the right dynamics and whose phase space is 
identical with the system's phase space) the members of the ensemble should be restricted to live on the 
system's invariant measure (attractor). Obviously, this is not a priori known. Unrestricted initial conditions, 
consistent only with the observational uncertainty, will extend into the full phase space and generate over- 
dispersive ensembles. When this is the case, the predicted probabilities will not match the relative frequencies 
as demonstrated by Gilmour Generally, all models, including ours, are imperfect and, therefore, a perfect 
ensemble and an accountable forecast method do not exists. Nevertheless, when the initial conditions are 
uncertain, ensemble forecasts are still required. A reasonable constraint would require the members of the 
ensemble to live on the projection of system's invariant measure (attractor) into the model phase space, 
although this choice is not necessarily optimal or even unique |28j . 

Assuming no a priori knowledge of the true attractor projected into the model phase space, we present 
in Fig. Elan unconstrained ensemble of trajectories evolving under the dynamics of the ANN model. Each 
trajectory starts from a different initial state that contains small Gaussian perturbations from the exact initial 
conditions and is integrated forward in time for F — 250 steps. We compare the ensemble evolution (blue 
lines) with the true evolution of the most dominant surface mode (red, thick line). The resulting ensembles 
of forecasts can be interpreted as a probabilistic prediction. The goal is to predict the time and the size of 
the jumps in the evolution of the first mode amplitude (which, as we have already discussed, corresponds to 
the time and size (potency) of large seismic events), and to estimate their forecast accuracy. Due to the time 



delay involved, the best time resolution of each trajectory cannot be in this case less than At = 0.1 yr. As 
the fault evolves to the next time step, 0.1 years later, we update the present state of the system and generate 
a new F step ensemble forecast starting from this new state. This procedure is intended to incorporate the 
information about the system and its current state as is continuously generated by new observations. 

Is clear from Fig. 1121 that even though we lack a perfect model or an optimal ensemble, all members of 
the ensemble forecast the incoming sequence of large events. The forecast uncertainty of the ensemble grows 
very slowly showing long time reliability. The members of the ensemble spread out at a rate that depends on 
the local nonlinear structure of the model. This rate gives a local estimate of the stability of forecasts made 
in this region of the model's state space. It also controls the time scale (predictability time) on which the 
ensemble members scatter along significantly different trajectories. We observe regions of large predictability 
time that coexist with regions of relatively short predictability time [S]. Significantly, the ANN has shown 
consistently the ability to improve its forecasting as the system approaches a large event. The predictability 
time is significantly larger than the microscopic expansion rate which is controlled by the maximal Lyapunov 
exponent. As hinted earlier, the predictability is scale dependent and the large scale motions are more 
predictable than the small scale motions. 

The predictability also depends on the location of the system on its true underlying attractor. But unlike 
the growth of initial uncertainty in model phase space, the local nonlinear structure of the system's attractor 
controls how rapidly the small scale motions, not included in our model, progress to reach the large length 
scales. This expansion rate defines how rapidly the truth, red line in Fig. ^1 diverges from the best guess 
model trajectory. This is the trajectory starting from the true system state projected onto the model phase 
space. Of course, for the same model state there are infinitely many system states distinguishable only in 
their small length scale structure. Clearly, the accurate shadowing of the truth by the ensemble trajectories 
in Fig. El shows that growth of the small scale uncertainties does not severely limits the model forecasting 
skill. Their small growth rate makes possible the deterministic modeling of the large scale motions. It sets an 
upper bound for the model predictability time, which can be approached by improving the parametrization 
of the small scale dynamics. 

8. Conclusions 

We describe a conceptual framework for modeling and forecasting the evolution of a large strike-slip earth- 
quake fault. The approach relies on the detection of spatio-temporal strain patterns embedded in the observ- 
able surface displacements: no detailed knowledge of the fault geometry, dynamics, or rheology is required. 



Rather than directly modeling the fault dynamics, we propose instead to model the dynamics of observable 
surface deformations, which are nonlinearly related to the original dynamics of the fault system. 

The essence and novelty of the method lies in the discovery that the large length and time scales dynamics 
have a strong low- dimensional, deterministic component and are therefore amenable to representation by a 
deterministic model. We have also found that the large scale motions provide reliable forecasting information 
about the large seismic events. These two fundamental results set the stage for standard data processing 
and model reconstruction techniques. First, we identify the large scale motions with generalized directions 
(spatial modes) along which the dynamics has its largest fluctuations. Finding these directions is the natural 
task of the proper orthogonal decomposition applied to the ensemble of surface deformations generated during 
the evolution of the system. The most dominant spatial modes define the model phase space and provide an 
optimal embedding for the large scale dynamics of the system. Second, the model reconstruction consists in 
finding a nonlinear set of ODEs whose trajectory in model phase space approximates the system trajectory 
projected into the model phase space. This is a learning task that can be successfully accomplished by an 
artificial neural network. 

The method relies on the existence of some separation between small and large length and time scales, and 
the physics responsible for this separation stems from the dynamic weakening used to model static/kinetic 
friction. We argue that in the presence of significant dynamic weakening the large scale dynamics defines the 
low-dimensional backbone of the system's attractor. The small scale dynamics is practically indistinguishable 
from stochastic dynamics and evolves in a small neighborhood of the attractor backbone. Based on this 
geometric picture, we argue that the statistical properties of the small scale motions arc determined by 
the larger-scale motions upon which they are superposed. In turn, the large scale motions depend on the 
statistical properties of the small scales. As pointed out by Lorenz |19) . there remains uncertainties in the 
small scale statistics, and hence in their influence upon the larger scale. The predictability time of the large 
scale motions is controlled by the rapidity with which the small scales uncertainties progress to reach the 
very large scales. This sets an upper bound to the predictability time of any large scale model. Due to the 
robustness shown by the model ensemble forecast, we conclude that the intrinsic predictability time of the 
large scale dynamics is very long, perhaps of the order of 10-20 years. This is ultimately the reason behind 
the deterministic behavior of the large scale motions. 

There are many difficult problems that are currently under investigation. For example, we presently study 
how robust the large scale behavior is to changes in the model parameters. Preliminary results confirm the 
controlling role of the dynamic weakening and show that the deterministic character of the large scale motions 
is robust to large changes in this parameter. We are also studying the effects of correlated heterogeneities, 



smooth brittle to ductile transitions, and continuous transition from static to kinetic friction. We do expect 
that all these effects are averaging out the small scale dynamics and strengthen the deterministic structure 
of the attractor. Probably, the most difficult problems to address arc the construction of good models from 
short data observations and the problem of spatio-temporal chaos. In the last case, dimensions and Lyapunov 
exponents become intensive quantities and we want to understand how the method scales with the size of 
the fault system. The current paper is concerned primarily with developing a methodology and the obtained 
results are preliminary. Continuing studies along the directions of this work may have a significant impact 
on the earthquake predictability problem. 
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Captions 

Fig. 1 Schematic evolution of the stress and slip for an individual fault cell. The parameters that describe the 
static/kinetic friction law are: static strength, t s , dynamic strength, t<j, and arrest stress, r a . The time 
dependent failure strength of the cell, r/, equals the static strength if the cell has not slipped yet and 
is adjusted to its dynamic value when the cell fails (from 3 ). 

Fig. 2 A 30 years evolution of the potency released by two fault systems. Results for system A, with small 
dynamic overshoot D = 1.5 are on the left and for system B, with large dynamic overshoot D = oo 
(and no dynamic weakening) are on the right. System A exhibits quasi-regular seismic cycles associated 
with the large events. There is no detectable structure in the evolution of seismicity in system B. 

Fig. 3 Time evolution of the average slip deficit (left) and its variance (right) for the D = 1.5 (blue line) and 
D = oo (red line) fault systems. 

Fig. 4 Log- log frequency-size histogram plots for the D — 1.5 system (blue squares) and the D — oo system 
(red circles). The earthquake size is measured by its potency and the histograms use equal bins of size 
100 in potency units (0.29mm x Km 2 ). 

Fig. 5 The fraction of false nearest neighbors (FNN) as a function of the embedding dimension m — 1, . . . , 10. 
Compared to model B (D = oo), there is a dramatic decrease in the number of FNN for model A 
(D = 1.5), as the dimension of the embedding phase space is increased. This suggests that a low- 
dimensional deterministic dynamics is a good approximation for the large scale dynamics of the fault. 

Fig. 6 Estimates of the maximal Lyapunov exponent from the slip deficit variance time series data, <r(t), for 
the D = 1.5 model. The logarithm of the stretching factor is computed for three different neighborhood 
sizes, e, and m = 6, 7, 8, 9, 10 embedding dimensions. The robust linear behavior of S( An) reflects the 
underlying determinism of the data and its slope is an estimate of the maximal Lyapunov exponent, 
A mal = 0.8 years -1 (the straight line has slope 0.08 and the unit of time is 0.1 years). 

Fig. 7 The first four POD spatial modes of the u x surface deformation field for fault system A with dynamic 
overshoot D — 1.5. These modes define, in decreasing order of their eigenvalues, generalized deformation 
directions which support most of the variance produced by the dynamics of surface fluctuations. The 
coherent, collective nature of the modes is reflected in their overall shape. 

Fig. 8 Cumulative normalized eigenvalue spectrum for each of the surface deformation modes for system A 
on the left and B on the right - only the first 20 POD modes are shown. In order to explain / = 95% 



of the variance in the u x ,u y ,u z data sets for system A we need only 2, 3, and 5 modes respectively. 
In contrast, for system B we need 9, 27, and 42 modes. Generally, the surface deformations along the 
strike direction, u x , have the most efficient POD description. 

Fig. 9 Modal time series (red lines) describing the evolution of the first four u x spatial modes shown in Fig.0 
This time dependence is determined by projecting the u x surface deformation onto each POD mode 
every 0.1 years. The blue lines describe the time evolution of the cumulative potency released over each 
0.1 year interval. Note the correlations between the time intervals of high potency released and the 
discontinuities present in the temporal modes. 

Fig. 10 Local slopes ^(to, e) of the correlation sum for the multivariate time series describing the evolution of 
systems A (on the left) and B (on the right) for embedding dimensions m = 1, ... , 20. For system A 
(D = 1.5), we detect the emergence of a scaling region around e ss 100, suggesting a correlation dimen- 
sion d,2 — 1-3 and self-similar geometry of the attractor at the large length scales. This behavior breaks 
down when the attractor is probed at smaller length scales where the dynamics is high dimensional. 
Within the same range of embedding dimensions, system B (D = oo) shows no signature of a large 
scale dimension collapse. 

Fig. 11 The temporal autocorrelations of the first three modes of the u x deformation field in D = 1.5 model 
(mode 1 red line, mode 2 blue line, and mode 3 green line). The short-term oscillatory behavior and 
the slow long-term amplitude decay reveal the presence of two correlation time scales. The oscillations 
become faster and the long-term coherence decreases rapidly for the higher order modes. 

Fig. 12 Model ensemble forecast for the evolution of the first u x mode (coordinate A\ In model phase space) 
as predicted by the iterated ANN (blue lines). Each trajectory of the ensemble starts from a different 
initial condition and is integrated forward in time for 250 steps. Each ensemble trajectory evolves into 
the model phase space, while the red dashed line represents the true trajectory of the system projected 
onto the first coordinate of the model phase space (mode one). The ensemble is consistent with an 
arbitrary Gaussian, initial observational uncertainty, but its members do not live on the projection 
of the true system's attractor into the model phase space. Nevertheless, all members of the ensemble 
consistently forecast the incoming sequence of large events, proving the long term forecasting reliability 
We also notice a systematic bias in estimating the time to the next large event. 
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